Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?

Ito, Akira, Miura, Takayuki, Todo, Yosuke

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored recently. ReLU-based DNNs are the most commonly and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are usually invisible, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the attack-target depth grows. In practice, satisfying these assumptions requires an exponential number of queries with respect to the attack depth, implying that the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers to extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.
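The hard-label setting the abstract describes can be made concrete with a toy sketch: even when an oracle returns only a class label and never a logit, an attacker can recover points on the decision boundary by bisection, at one query per halving step. The tiny one-neuron "network" and threshold below are invented for illustration and are not the construction from the paper:

```python
import numpy as np

def hard_label_oracle(x):
    """Toy hard-label oracle for f(x) = ReLU(x0 - 0.5): the attacker
    observes only the class label, never the logit."""
    logit = max(x[0] - 0.5, 0.0)
    return int(logit > 0.0)

def find_boundary_point(oracle, x_neg, x_pos, iters=60):
    """Bisect between two differently-labelled inputs to pin down a
    point on the decision boundary; each step costs one query."""
    for _ in range(iters):
        mid = (x_neg + x_pos) / 2.0
        if oracle(mid) == oracle(x_neg):
            x_neg = mid
        else:
            x_pos = mid
    return (x_neg + x_pos) / 2.0

p = find_boundary_point(hard_label_oracle,
                        np.array([0.0, 0.0]),   # labelled 0
                        np.array([1.0, 0.0]))   # labelled 1
print(p)  # ≈ [0.5, 0.0]
```

Locating such critical points is the cheap part; the paper's argument concerns how the query cost of the subsequent parameter-recovery steps scales with depth.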


InterKey: Cross-modal Intersection Keypoints for Global Localization on OpenStreetMap

Tran, Nguyen Hoang Khoi, Berrio, Julie Stephany, Shan, Mao, Worrall, Stewart

arXiv.org Artificial Intelligence

Reliable global localization is critical for autonomous vehicles, especially in environments where GNSS is degraded or unavailable, such as urban canyons and tunnels. Although high-definition (HD) maps provide accurate priors, the cost of data collection, map construction, and maintenance limits scalability. OpenStreetMap (OSM) offers a free and globally available alternative, but its coarse abstraction poses challenges for matching with sensor data. We propose InterKey, a cross-modal framework that leverages road intersections as distinctive landmarks for global localization. Our method constructs compact binary descriptors by jointly encoding road and building imprints from point clouds and OSM. To bridge modality gaps, we introduce discrepancy mitigation, orientation determination, and area-equalized sampling strategies, enabling robust cross-modal matching. Experiments on the KITTI dataset demonstrate that InterKey achieves state-of-the-art accuracy, outperforming recent baselines by a large margin. The framework generalizes to sensors that can produce dense structural point clouds, offering a scalable and cost-effective solution for robust vehicle localization.
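As an illustration of the kind of compact binary-descriptor matching the abstract describes (the packing and sizes here are assumptions, not InterKey's actual encoding), nearest-neighbour search over bit-packed descriptors reduces to an XOR followed by a popcount:

```python
import numpy as np

def hamming_match(query, database):
    """Match a binary descriptor against a database by minimum Hamming
    distance; descriptors are stored as uint8-packed bit arrays."""
    diff = np.bitwise_xor(database, query)          # differing bits
    dists = np.unpackbits(diff, axis=1).sum(axis=1)  # popcount per row
    best = int(np.argmin(dists))
    return best, int(dists[best])

rng = np.random.default_rng(1)
db = rng.integers(0, 256, size=(100, 32), dtype=np.uint8)  # 100 x 256-bit
query = db[42].copy()
query[0] ^= 0b00000111          # flip 3 bits to simulate modality noise
idx, dist = hamming_match(query, db)
print(idx, dist)  # → 42 3
```

Random 256-bit descriptors sit at an expected Hamming distance of about 128 from each other, so a 3-bit perturbation still matches its source unambiguously, which is what makes such descriptors tolerant of cross-modal discrepancies.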


Inverse Kinematics for a 6-Degree-of-Freedom Robot Manipulator Using Comprehensive Gröbner Systems

Okazaki, Takumu, Terui, Akira, Mikawa, Masahiko

arXiv.org Artificial Intelligence

We propose an effective method for solving the inverse kinematics problem of a specific model of 6-degree-of-freedom (6-DOF) robot manipulator using computer algebra. It is known that when the rotation axes of three consecutive rotational joints of a manipulator intersect at a single point, the inverse kinematics problem can be divided into determining position and orientation. We extend this method to more general manipulators in which the rotational axes of two consecutive joints intersect. This extension broadens the class of 6-DOF manipulators for which the inverse kinematics problem can be solved, and is expected to enable more efficient solutions. The inverse kinematics problem is solved using the Comprehensive Gröbner System (CGS), with the joint parameters of the robot appearing as parameters in the coefficients to prevent repetitive calculations of the Gröbner bases. The effectiveness of the proposed method is shown by experiments.
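The idea of casting inverse kinematics as polynomial system solving can be sketched on a planar 2R arm, a far simpler mechanism than the paper's 6-DOF manipulator, using a plain lex Gröbner basis rather than a comprehensive one. Joint angles are represented by their sines and cosines, with unit-circle constraints:

```python
from sympy import symbols, groebner, solve

c1, s1, c2, s2 = symbols('c1 s1 c2 s2')
px, py, l1, l2 = 1, 1, 1, 1   # target position and link lengths (toy values)

# Forward kinematics of a planar 2R arm as polynomials in the joint
# cosines/sines, plus the unit-circle constraints.
eqs = [
    l1*c1 + l2*(c1*c2 - s1*s2) - px,
    l1*s1 + l2*(s1*c2 + c1*s2) - py,
    c1**2 + s1**2 - 1,
    c2**2 + s2**2 - 1,
]

# A lex Groebner basis triangularizes the system, so joint values can be
# read off by back-substitution.
G = groebner(eqs, c1, s1, c2, s2, order='lex')
sols = solve(list(G), [c1, s1, c2, s2], dict=True)
print(sols)  # two elbow configurations
```

For this target the two exact solutions correspond to joint angles (0°, 90°) and (90°, −90°). In the paper's CGS formulation the target pose stays symbolic in the coefficients, so the basis is computed once and reused for every query rather than recomputed per target as here.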


InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method

Tran, Nguyen Hoang Khoi, Berrio, Julie Stephany, Shan, Mao, Ming, Zhenxing, Worrall, Stewart

arXiv.org Artificial Intelligence

Online localization of road intersections is beneficial for autonomous vehicle localization, mapping and motion planning. Intersections offer strong landmarks for correcting vehicle pose estimation, anchoring new sensor data in up-to-date maps, and guiding vehicle routing in road network graphs. Despite this importance, intersection localization has not been widely studied, with existing methods either ignoring the rich semantic information already computed onboard or relying on scarce, hand-labeled intersection datasets. To close this gap, we present a novel LiDAR-based method for online vehicle-centric intersection localization. We detect intersection candidates in a bird's eye view (BEV) representation formed by concatenating a sequence of semantic road scans. We then refine these candidates by analyzing the intersecting road branches and adjusting the intersection center point in a least-squares formulation. For evaluation, we introduce an automated pipeline that pairs localized intersection points with OpenStreetMap (OSM) intersection nodes using precise GNSS/INS ground-truth poses. Experiments on the SemanticKITTI dataset show that our method outperforms the latest learning-based baseline in accuracy and reliability. Sensitivity tests demonstrate the method's robustness to challenging segmentation errors, highlighting its applicability in the real world.
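The least-squares center refinement admits a simple sketch: treating each road branch as a 2-D line (a point plus a unit direction), the point minimizing the sum of squared perpendicular distances to all branches solves a small linear system built from the projectors onto each line's normal. The branch geometry below is synthetic, not from the paper:

```python
import numpy as np

def refine_center(points, dirs):
    """Least-squares point closest to a set of 2-D lines, one per road
    branch: minimizes the sum of squared perpendicular distances."""
    A = np.zeros((2, 2))
    rhs = np.zeros(2)
    for p, d in zip(points, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(2) - np.outer(d, d)   # projector onto the line normal
        A += P
        rhs += P @ p
    return np.linalg.solve(A, rhs)

# Three branches meeting at (2, 3), each observed at a different point.
center = np.array([2.0, 3.0])
dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
points = [center + 5.0 * d for d in dirs]   # samples along each branch
est = refine_center(points, dirs)
print(est)  # ≈ [2. 3.]
```

With noisy branch estimates the recovered point is the maximum-likelihood intersection center under isotropic Gaussian perpendicular errors, which is presumably why a least-squares formulation is a natural choice for this refinement step.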


Construction Site Scaffolding Completeness Detection Based on Mask R-CNN and Hough Transform

Lin, Pei-Hsin, Lin, Jacob J., Hsieh, Shang-Hsien

arXiv.org Artificial Intelligence

Construction site scaffolding is essential for many building projects, and ensuring its safety is crucial to prevent accidents. The safety inspector must check the scaffolding's completeness and integrity, which is where most violations occur. The inspection process includes ensuring all the components are in the right place, since workers often compromise safety for convenience and disassemble parts such as cross braces. This paper proposes a deep learning-based approach to detect the scaffolding and its cross braces using computer vision. A scaffold image dataset with annotated labels is used to train a convolutional neural network (CNN) model. With the proposed approach, we can automatically detect the completeness of cross braces from images taken at construction sites, without the need for manual inspection, saving a significant amount of time and labor costs. This non-invasive and efficient solution for detecting scaffolding completeness can help improve safety on construction sites.
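The Hough transform half of the title's pipeline can be sketched in a few lines: every foreground pixel votes for all (theta, rho) line parameterizations passing through it, and peaks in the accumulator correspond to straight structures such as braces. Here a synthetic binary image stands in for the Mask R-CNN segmentation output:

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Minimal Hough transform: vote each foreground pixel into
    (theta, rho) bins and return the strongest line's parameters."""
    ys, xs = np.nonzero(binary)
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.ceil(np.hypot(*binary.shape)))   # max possible |rho|
    acc = np.zeros((n_theta, 2 * diag + 1), dtype=int)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rhos + diag] += 1  # one vote per angle
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    return float(np.rad2deg(thetas[t])), int(r - diag)

# Synthetic "brace": a vertical stroke at x = 7 in a 20x20 mask.
img = np.zeros((20, 20), dtype=bool)
img[:, 7] = True
theta, rho = hough_lines(img)
print(theta, rho)  # → 0.0 7   (the line x*cos(0) + y*sin(0) = 7)
```

In practice one would use a library implementation such as OpenCV's `HoughLinesP` on the segmentation mask; the pure-NumPy version above only illustrates the voting scheme.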


Real Time Offside Detection using a Single Camera in Soccer

Desai, Shounak

arXiv.org Artificial Intelligence

Technological advancements in soccer have surged over the past decade, transforming aspects of the sport. Unlike binary rules, many soccer regulations, such as the "Offside Rule," rely on subjective interpretation rather than straightforward True or False criteria. The on-field referee holds ultimate authority in adjudicating these nuanced decisions. A significant breakthrough in soccer officiating is the Video Assistant Referee (VAR) system, leveraging a network of 20-30 cameras within stadiums to minimize human errors. VAR's operational scope typically encompasses 10-30 cameras, ensuring high decision accuracy but at a substantial cost. This report proposes an innovative approach to offside detection using a single camera, such as the broadcasting camera, to mitigate expenses associated with sophisticated technological setups.


Design of a Visual Pose Estimation Algorithm for Moon Landing

Süslü, Atakan, Kuran, Betül Rana, Söken, Halil Ersin

arXiv.org Artificial Intelligence

In order to make a pinpoint landing on the Moon, the spacecraft's navigation system must be accurate. To achieve the desired accuracy, navigational drift caused by the inertial sensors must be corrected. One way to correct this drift is to use absolute navigation solutions. In this study, a terrain absolute navigation method to estimate the spacecraft's position and attitude is proposed. This algorithm uses the positions of the craters below the spacecraft for estimation. Craters seen by the camera onboard the spacecraft are detected and identified using a crater database known beforehand. In order to focus on the estimation algorithms, the image processing and crater matching steps are skipped. The accuracy of the algorithm and the effect of the number of craters used for estimation are inspected through simulations.


ScissorBot: Learning Generalizable Scissor Skill for Paper Cutting via Simulation, Imitation, and Sim2Real

Lyu, Jiangran, Chen, Yuxing, Du, Tao, Zhu, Feng, Liu, Huiquan, Wang, Yizhou, Wang, He

arXiv.org Artificial Intelligence

This paper tackles the challenging robotic task of generalizable paper cutting using scissors. In this task, scissors attached to a robot arm are driven to accurately cut curves drawn on the paper, which is hung with its top edge fixed. Due to the frequent paper-scissor contact and consequent fracture, the paper undergoes continual deformation and changing topology, which is difficult to model accurately. To ensure effective execution, we customize an action primitive sequence for imitation learning to constrain its action space, thus alleviating potential compounding errors. Finally, by integrating sim-to-real techniques to bridge the gap between simulation and reality, our policy can be effectively deployed on the real robot. Experimental results demonstrate that our method surpasses all baselines in both simulation and real-world benchmarks and achieves performance comparable to human operation with a single hand under the same conditions.


Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation

Seraj, Esmaeil, Bhate, Harsh, Talamonti, Walter

arXiv.org Artificial Intelligence

Transparent Displays (TDs) are cutting-edge visual technologies that allow users to see digital content superimposed over physical environments with a variety of applications in dynamic Head-Up Displays (HUDs) in vehicles [1, 2, 3], augmented reality glasses [4, 5, 6], and smart windows in commercial buildings [7]. Their ability to blend digital information with the real world offers significant advancements in fields such as navigation, interactive advertising, robotics [8, 9, 10, 11], and immersive user interfaces and feedback [12, 13, 14, 15, 16]. Imagine a transparent display, such as a dynamic HUD in a vehicle, that not only shows essential metrics like speed, fuel levels, and engine status but also overlays navigational cues directly onto the road ahead, highlighting paths, directions, pedestrians, and other vehicles [2, 1, 17]. Beyond practical utilities, such dynamic HUDs could enhance the journey by identifying points of interest, e.g., service stations, or even serve as platforms for entertainment and work-related activities. However, realizing this vision introduces significant challenges, particularly in tracking the user's gaze across an ever-changing array of widgets and information layers projected onto the transparent display. Moreover, the accurate estimation of gaze depth levels is crucial, especially because of the display's transparency and the potential for the human gaze to interact with or pass through specific widgets, necessitating a system that can precisely discern the focus of a user's attention between virtual overlays and real-world objects to enhance both interactivity and safety [18]. The dynamic nature of this problem, coupled with the need for real-time processing, sets a complex problem space for effectively identifying and monitoring what the user is focusing on at any given moment.